Attention Correctness in Neural Image Captioning

Authors

  • Chenxi Liu
  • Junhua Mao
  • Fei Sha
  • Alan L. Yuille
Abstract

Attention Map Visualization. We visualize the attention maps of both the implicit attention model and our supervised attention model on the Flickr30k test set. As mentioned in the paper, 909 noun phrases are aligned for the implicit model and 901 for the supervised model. Of these alignments, 635 are common to both models, and 595 of those have corresponding bounding boxes. Here we present a subset due to space constraints. In every figure, the original image is on the left, the implicit attention result is in the middle, and the supervised attention result is on the right. The red box marks the ground-truth attention region, as annotated in the Flickr30k Entities dataset. The attention correctness score for the phrase is shown in parentheses.
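
The abstract does not say how the attention correctness score is computed. As a minimal sketch, assume it is the share of total attention mass that falls inside the ground-truth box. The function name, the grid-cell box format, and the toy values below are illustrative assumptions, not taken from the paper.

```python
def attention_correctness(attention, box):
    """Fraction of attention mass inside a ground-truth box.

    ASSUMPTION: this definition is an illustrative reading of the score,
    not confirmed by the text above.

    attention: 2D list of non-negative weights (rows x cols).
    box: (row_min, col_min, row_max, col_max), inclusive, in grid cells.
    """
    total = sum(sum(row) for row in attention)
    if total <= 0:
        raise ValueError("attention map must have positive total mass")
    r0, c0, r1, c1 = box
    inside = sum(
        attention[r][c]
        for r in range(r0, r1 + 1)
        for c in range(c0, c1 + 1)
    )
    # Normalizing by the total makes the score lie in [0, 1].
    return inside / total


# Toy 3x3 attention map (hypothetical values, not from the paper).
toy_map = [
    [0.0, 0.1, 0.0],
    [0.1, 0.5, 0.1],
    [0.0, 0.2, 0.0],
]
# Hypothetical ground-truth box covering the middle column, rows 1 to 2.
score = attention_correctness(toy_map, (1, 1, 2, 1))
print(round(score, 2))
```

Under this reading, a map that puts all of its weight inside the box scores 1, and a map that puts none there scores 0, which matches the intuition behind comparing each attention result against the red box.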

Similar resources

Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention

Image captioning has been recently gaining a lot of attention thanks to the impressive achievements shown by deep captioning architectures, which combine Convolutional Neural Networks to extract image representations, and Recurrent Neural Networks to generate the corresponding captions. At the same time, a significant research effort has been dedicated to the development of saliency prediction ...

Seeing with Humans: Gaze-Assisted Neural Image Captioning

Gaze reflects how humans process visual scenes and is therefore increasingly used in computer vision systems. Previous works demonstrated the potential of gaze for object-centric tasks, such as object localization and recognition, but it remains unclear if gaze can also be beneficial for scene-centric tasks, such as image captioning. We present a new perspective on gaze-assisted image captionin...

Technical Report: Image Captioning with Semantically Similar Images

This report presents our submission to the MS COCO Captioning Challenge 2015. The method uses Convolutional Neural Network activations as an embedding to find semantically similar images. From these images, the most typical caption is selected based on unigram frequencies. Although the method received low scores with automated evaluation metrics and in human assessed average correctness, it is ...

Multimodal Attention for Neural Machine Translation

The attention mechanism is an important part of neural machine translation (NMT), where it was reported to produce richer source representations than fixed-length encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simult...

Image Captioning with Attention

In the past few years, neural networks have fueled dramatic advances in image classification. Emboldened, researchers are looking for more challenging applications for computer vision and artificial intelligence systems. They seek not only to assign numerical labels to input data, but to describe the world in human terms. Image and video captioning is among the most popular applications in this t...

Publication date: 2017